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1. Introduction 


The overall objectives of this research work are to formulate 
and validate efficient parallel algorithms, and to efficiently 
design/implement computer software for solving large-scale 
acoustic problems, arised from the unified frameworks of the 
finite element procedures. 

The adopted parallel Finite Element (FE) Domain Decomposition (DD) 
procedures should fully take advantages of multiple processing 
capabilities offered by most modem high performance computing 
platforms for efficient parallel computation. To achieve this 
objective, the formulation needs to integrate efficient sparse 
(and dense) assembly techniques (see Section 2), hybrid (or mixed) 
direct and iterative equation solvers (see Section 3), proper 
preconditioned strategies, unrolling strategies [Ref. 6.5, Chapter 10], 
and effective processors' communicating schemes (see Section 3). 

Finally, the numerical performance of the developed parallel 
finite element procedures will be evaluated by solving series of 
structural, and acoustic (symmetrical and un_symmetrical) problems 
(in different computing platforms). Comparisons with existing 
"commercialized" [Ref. 6.10] and/or "public domain" [Ref. 6.1 1] 
software are also included, whenever possible. 
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2. Algorithms and Application of Sparse Matrix Assembly and 
Equation Solvers for Aeroacoustics [Ref. 6.1] 

(Please refer to the attached Journal Paper for this section) 
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Nomenclature 

= global stiffness matrix and load vector 
without source effects 
= global stiffness matrix and loads vector 
w ith source effects 

= local element matrices for a rigid wail duct 
- one-dimensional arrays containing sparse 
matrix coefficients 

= element stiffness matrix and loads vector 
= complex matrix coefficients 
= contributions to the element stiffness 
matrix due to the exit plane and 
interior elements 
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diagonal and unit lower triangular matrices 

d, 

= 

value of field variable at local node / 

Id 

= 

element degree-of- freedom matrix 

(EM 

= 

discrete error vector 

E, . V,„ 

= 

error and three-dimensional basis functions 

F 

= 

fill in resulting from matrix factorization 

(Eh 

= 

/ th component of ( F ) 

( E i V ) | 

= 

( F] vector of length N 

/a 

— 

source frequency and free 
space wave number 

H. L. \\ 

— 

height, length, and width of 
three-dimensional duct 

|//.-l|. \ha\ 

= 

matrix of pointers for sparse assembly 

l HMu 


coefficient of the / th row and ith column 
of \HA\ 

l HMK. A/)| 
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[HA | an N x M matrix 

/». /. w 


height, length, and width of 
three-dimensional finite element 

(Ml 


array containing the number of nonzeroes 
per row 

(/C). (/£}. [IET] 


arrays of starting locations of nonzero 
coefficients 

i/n.i/n 


permutation and inverse permutation 
vectors 

m. UC) 

= 

array of row and column indexes for 



nonzero matrix coefficients 

_ i 

[JA I 

= 

V * 

array containing the column numbers 
of the nonzero off-diagonal 
matrix coefficients 

[JE | 
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array of element connectivities 

{JET) 


array of element numbers connected 
to each degree of freedom 
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number of nonzero coefficients in a sparse 
matrix. N + N\ 

maximum number of elements connected 
to a degree of freedom 
array of element numbers connected to a 
degree of freedom 

array containing the number of elements 
connected to a degree of freedom 
master degree-of-freedom array 
maximum number of nonzero coefficients 
per row 

number of unknowns in the finite element 
discretization 

number of finite elements and degrees of 
freedom per element 

number of fill ins during factorization of a 
matrix 

total number of transverse, span wise, and 
axial nodes 

number of nonzero, off-diagonal 
coefficients before factorization 
number of nonzero, off-diagonal 
coefficients after factorization 
acoustic pressure field 
acoustic pressure at local node n: and 
source pressure 

relative error norm and uniform fiou speed 

surface and volume of a finite element 

nonzero value that was modined during 

matrix factorization 

Cartesian coordinates 

transverse, span wise, and axial locations 

of grid lines 

dimensionless exit admittance 
derivative of the acoustic pressure 
normal to a surface 

dimensionless wall and exit impedance 
global vector of unknowns 
local vectors ot unknowns 
intermediate vectors for forward and 
backward substitution 
vector components 
gradient vector and Laplace operator 
vector dot product 


exit and source plane index 
factored and reordered matrix 
forward and backward substitution 
row and column index of a matrix 


truss element number 
grid line locator for three-dimensional duct 
matrix or vector transpose 
complex conjugate 

I. Introduction 

T HREE-DIMENSIONAL aeroacoustics codes that can accu- 
rately predict the noise radiated from commercial aircraft are 
needed. ! Currently, noise prediction codes require the use of a linear 
equation solver before radiated noise can be predicted. .An optimizer 
must then run the noise predictive code on a digital computer hun- 
dreds of times to achieve an aircraft design with a minimal noise 
radiation signature. 

Currently, industry and government aircraft noise predictive 
codes are either two-dimensional or treat only axisymmetric noise 
signatures. 1 When the volumes arc three-dimensional, the currently 
used equation solvers require an excessive amount of CPU time 
and RAM for their assembly and solution. This excessive CPU time 
and computer storage restricts aircraft noise prediction codes to 
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low-frequency sound sources in two-dimensional or axisymmetrr 
environments. 

Sparse equation solving technologies^ 14 have been developed 
and are well documented for several engineering applications, -n : 
the computational advantage of sparse solver technology ovtr 
more conventional technologies (such as band or skyline soi 
has been demonstrated. In addition, for practical engineering id- 
eations. system matrix equations must be developed for an unstruc- 
tured grid to which boundary conditions are often difficult to app:\ 
The finite element method is the simplest for generating the sxsterr. 
matrix on an unstructured grid. 

Only recently have sparse solver technologies been applied to 
aeroacoustics. 1 ‘ 1? In Ref. 1. several direct and iterative equation 
solvers were evaluated to determine their applicability to two- 
dimensional duct aeroacoustics computations with the direct sparse 
solver emerging as the most promising. In Ref. 15. sparse mv . - 
equation solving methodology was extended to three-dimens i 
acoustically lined ducts. However, the work presented in Re: .5 
adopted the assembly strategy that is currently available in the lit- 
erature for assembling system sparse matrix equations. This simple 
but inefficient assembly strategy precludes the use of sparse sobers 
for three-dimensional aeroacoustic computations. 15 

Most, if not all. major codes for analysis and optimal design al- 
low users to select either iterative or direct equation solver*. For 
nacelle aeroacoustics computations, iterative solvers are not a< ro- 
bust as direct solvers because the nacelle equation system is poor K 
conditioned. 1 Iterative solution methods, when applied to system' 
poorly conditioned equations, have the disadvantage that they do . 
converge, or they converge very slowly. A further disadvantage o: 
applying iterative solution methods to solve the nacelle equation *> 
tem is that the nacelle equation system often contains multiple right- 
hand sides, heratix c methods are not as efficient as direct method* on 
equation systems with multiple right-hand sides because the equa- 
tion system must be reformed and resolved for each right-hand *ide. 

The long-term objective of this research is to acquire the capabil- 
ity to design quiet aircraft in a fully three-dimensional aeroacoustic 
environment using direct sparse solver technologies and the finite 
dement methodogy. The current paper has iwo objectives. The 
objective is to bridge the gap between the aeroacoustic ian* < v- 
may not have a comprehensive know ledge of sparse assembh and 
equation solver technologies) and members of the sparse research 
community (who may not have comprehensive knowledge of fi- 
nite element analysis and aeroacoustics). The second objective to 
present efficient algorithms for assembling sparse matrix equations. 

Section II describes three sparse assembly algorithms for gener- 
ating systems of sparse linear equations. Section III describe* the 
template that is used to develop a complete, unstructured grid, fi- 
nite element code, that is. equation reordering, symbolic/numencal 
factorization, supemode s/loop unrolling, and forward/back ware ' - 
lution phases. Section IV presents a detailed formulation of the *. 
ement stiffness matrices that will be assembled using the sparse as- 
sembly algorithms to form the system matrix fora three-dimensional 
duct aeroacoustics application. Finally. Sec. V discusses the accu- 
racy and numerical performance of the developed algorithms over 
the frequency range of interest for a three-dimensional aeroacoustics 
application. Note that although the sparse algorithms presented as- 
sume that the system matix equation is symmetric, these algorithms 
are easily extendible to nonsymmeinc systems of equation*. The 
algorithms can also be conveniently incorporated into a substruc- 
tunng (or domain decomposition) formulation to take advantage 
parallel computation to further reduce CPU time and RAM. 

II. Sparse Assembly Algorithms 
for Symmetric Systems 

Figure 1 is a two-dimensional truss (or rod) structure assembled 
from individual truss elements labeled (1), (2),..., (13) that are 
interconnected at eight nodes labeled 1 , 2, .... 8. An element ( e ) of 
the structure is assumed to possess only two points of connection, 
and the external loads are assumed to be applied at the nodes of the 
truss elements. Only a single degree of freedom (DOF) at each node 
is assumed. To further simplify discussions, it is assumed that, by a 
separate calculation, the dement stiffness matrix and external load 
vector for the truss element (e) are known and expressed as 


1 >:/ j 


l^’j 


Under the assumption of Eq. (1), the 13 truss elements (Fig. 1) 
may be assembled using the rules of finite element assembly 16 to 
obtain the system matrix equation 
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iO(,V)| = 



Although only a single DOF \ ? ns assumed at node / . the di$~ 
^'Mon to to How is easily extended to a DOF per node by extending 
: - coefficients in [A]. that is. A'jj. loq x q submatnces. The rules 
’‘tutrix algebra would then oe applied to each q :< q submatrix as 
- vv ere a scalar. 


Sparse Data Formats for the System Matrix 

F°r the sake of brevity, m the discussions to follow it will be 
turned that the element stiffness matrix is symmetric so that 



K'L - n <« 


Under the assumptions of Eq. (6), the system matrix [A\ is also 
symmetric and can be written in the form 
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The sparse descriptions of any symmetric system matrix | A | (see 


Eq..?)] is: 

fully described by the four one-dimensional 

vectors 

!/A(.Vi| = 

14.2.3. 2. I. 1 . 0 . 0) r 


{JAiM)\ : 

= (2. 4. 5. 7. 3. 5. 5. 6 . 8 . 5. 7. 6 . S) r 
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B. Application of Boundary Conditions 

In most engineering applications, the field variable at several 
boundary nodes may require constraints to satisfy a Dinchiet bound- 
ary condition of the form 


{*)/ =di (ID 

where d f is the specified value of the field variable at node I . 
Dinchiet boundary’ conditions may be applied at the element or 
system level. The impact of applying Dirichiet boundary conditions 
on the system matrix equation is identical whether applied at the 
element or system level. We will show the relatively easy process 
of applying Dirichiet boundary conditions at the element level and 
their impact on the system matrix equation [Eq. (2)]. 

The process for inserting the Dirichiet boundary condition. 
(O)/ is as follows: 

\ ) The coiumn of [A l<M ] corresponding to the Ah DOF is multi- 
plied by d,, and the result is subtracted from [P' 1 }. 
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2) The column corresponding to the /th DOF in IA { *'] is made 
^zero. 

. 3) The; row corresponding to the /th DOF in [A iel ] is made zero. 

4) The modified element matrix and the modified element load 
vector are assembled. 

5) [A)u is made equal to unity, and {F}, is made equal to d } . 
Thus, applying Dirichlet boundary conditions to the system ma- 
trix equation modifies Eq. (2 ) to 

M](<D) = {F) 

The numerical values of the coefficients in the modified system 
matrix 1.4] remain unchanged from those in 1 A), except for a few 
that are made zero during the application of the Dirichlet boundary 
conditions. Therefore, we w ill illustrate application of the assembly 
algorithms to the nonzero pattern of [A]. 

C. Sparse Assembly Algorithms 

Three symmetric sparse assembly algorithms will be explained 
in this section. The purpose of each assembly algorithm is to gen- 
erate the system loads vector {F} and the four vectors denned by 
Eqs. ( S— 10). which correspond to [A). The assembly algorithms 
are discussed starting w uh the simplest and proceeding to the most 
complex. 

; Aii'onthm I 

The mam ideas of this algorithm can be summarized by the fol- 
low mg computational tasks: 

1 5 Find how manv and which elements are connected to each 
DOF. 

a i Input data for ,Y..V£..\7\ and the elements connectivity isee 
Fig. 1). 

hi Compute the number of elements associated with each DOF. 
and store this information into the one-dimensional integer \ector 
|A/F<Y)|. 

c> Find the element numbers associated w ith each DOF. and 
store this information into the two-dimensional integer matrix 
;UA/(.\..V/F)|. 

2 1 Retrieve the stiffness matrix attached to each DOF. perform 
sparse matrix assembly one row at a time, and extract the four one- 
dimensional vectors required for the sparse equation solver. 


r M^nruhni 2 

The nonzero patterns of the symmetrical matrix \A] (see Eq. i7)J 
lor the two-dimensional truss example problem (Fig. 1 ) can be com- 
pletely described by the two one-dimensional integer vectors 

;/A*i M )) = 1. 1.4.4. 1. 1.2. 1. 5. 4. 2. ?. 2. 3. 5. 6. 3. 3. S. 6J r 

(13) 

!iC< A/)| = (7.7. 1.7. 4. 4. 2. 2. 5. 5. 5. 3. 3. 5. 5. 6. 6.6. 8. S.8j r 

(14) 

and the following integer matrix: 



The two one-dimensional integer vectors in Eqs. (13) and ( 14) are 
constructed by cycling through each element (e ) in increasing order 
and then determining the row' and column index of each nonzero 
coefficient from the connectivity array for each element (Fig. 1). 
Note that the matrix [HA] contains locations (or pointers) that are 
used to refer to vectors {//?} and {/C}. For example, the values 
of [//A] 4 , =4, [HA] 42 = 5. and [HA] a3 = 1 1 indicate that row 4 of £ 


mairix [A] will have these nonzero terms. The exact locations (row 
and column numbers » of those three nonzero terms in [A] can be 
referred to as 


664 
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Thus, the three nonzero terms of the fourth row of [A] are located at 
row 4. column 7: row 4. column 4; and row’ 4. column 5. respectively 
[see Eq. (7)). 

The integer matrix [HA] and system matrix [A] can be alterna- 
tively stored as one-dimensional vectors: 


\ha<\1)\ = (2.3.6. '.9.8. 12. 14. 13. 15. 18. 19. 

4.5. 11. 10. 16. 17.21. 1.20} r <]7 

|a(A/)) = {A\V. A ''\ . .4; 2 . A\:. Ay.. A'*, 4‘g. A™. A'[. 

At*- .A'. 1 /', a 1 '. . A\'\ 4'.g. A'?,. Ay. A\'"\ 4kk. A„. 4-)' 


The main ideas of algorithm 2 can be summarized by the following 
computational tasks: 

1 ) After initializing \fw} to a zero vector, process all elements un 
ascending order) to obtain the integer vectors {1R} and (7C1 while 
assembling [A] into ,j). 

2) Separate the diagonal and nonzero off-diagonal terms of j A 
from {a) and store this information in (AD) and (AA r ). Separate th 
diagonal and off-diagonal terms in {//?) and {7C}. and compute \JA 
and {/A j. 


3. Ali?onriim 3 

The nonzero palters of the symmetrical system matnx [A] can 
be completely described by {JA) and the follow ing one-dimensional 
integer vector: 

(/O.V-s-l ; = (1.5.7. 10. 12. 13. 14. 14. 14) r (|9> 


;/4|, = (/Cl,.,-(/C|, (20 

and the variable \ ] can be conveniently computed as 

M = |/CK.,-{/C}, 

The element connectivity information for the two-dimensional 
truss sample problem (Fig. 1) is full) described by the eiemeni- 
DOF matrix 

*10 0 0 0 0 10* 

0 0 0 10 0 10 

10 0 10 0 0 0 

110 0 0 0 0 0 

10 0 0 10 0 0 


[£(.V£. , Y i ] : 


0 0 0 1 10 0 0 

0 1 10 0 0 0 0 

0 10 0 10 0 0 
0 0 I 0 1 0 0 0 

0 0 0 0 1 10 0 

0 0 10 0 1 0 0 

0 0 10 0 0 0 1 

0 0 0 0 0 10 1 


*ach row of [£] contains exactly two nonzeros because each element 
tas two points of connection, or nodes, to the structure. Thus, [£]// i* 
lonzero only if node J is a node for element / . For example, the first 
ow of [£] contains a unit value only in columns 1 and 7. indicating 
hat the first element of the truss is connected to nodes 1 and 7 only 



(rig. l i ne concept oi an eismenM^ur matrix is easily extended 
to q DOF per node by extending each of the unity* coefficients in 
[£] to a q x q identity matrix. 

• To minimize the RAM, it is convenient to describe the element- 
DOF matrix [£] by the two one-dimensional vectors 

[IE(NE 4 1)} = {1, 3,5, 7, 9, 11, 13, 15, 17, 19,21.23, 25.27} r 

(23) 

[JE(NExNP)} = {7. 1.7. 4, 4. 1. 1,2. 1.5, 

4. 5. 2. 3. 5. 2. 5. 3. 5. 6. 6. 3. 3. 8, 6. 8} r (24) 

and the transpose of the element-DOF matrix <[£] r ) by the follow- 
ing two one-dimensional vectors: 

{IET(X 4- 1)J = {1.5.8. 12. 15. 20. 23. 25. 27} r (251 

{JETiXE X NP\) = {1.3. 4.5. 4. 7, 8.7,9. 11. 12.2.3. 

6.5.6. 8. 9. 10. 10. 11. 13. 1.2. 12. 13) r (26) 

The main ideas of algorithm 3 can be summarized by the following 
computational tasks: 

1 ) Assume that [/£). {/£}, {/£TJ, and {JET) have already been 
defined from the connectivity information (see Fig. 1 ). 

a) Compute j/C} and {/4} (symbolic assembly phase!. 

b) Compute {74} from Eq. (20). 

2) Assume that vectors [IA \ and \JA) have already been dehned 
from the symbolic assembly (task 1 ). Compute [AN] and {AD} from 
[A' v '\ (numerical assembly phase!. 

Ill, Sparse Algorithms for Solving 
Symmetrical Equations 

In this section, the major tasks involved in solving sparse svs- 
:ems of linear equations are briefly explained. The success of the 
sparse solver is due to improved technologies u.e.. equation re- 
ordering, matrix decomposition, supemodes and k>op unrolling, tor- 
ward backward solution phases) and bookkeeping strategies ideal 
for implementation on a digital computer. More detailed information 
on improved technologies can be obtained from Refs. 2-14. 


A. Sparse Reordering Alfiorithms 

After imposing the boundary conditions, the modified stiffness 
matrix |.4| can be obtained from [.4| as indicated in the discussions 
efore Eq. ( 12). Equation ( 12) should never be solved directly To 


funner simplify the discussions, we 
the following numerical values: 


110 7 

7 112 



5 0 

3 0 


sill assume that matrix [ A 1 has 

4 0 5 3 ' 

0 2 0 0 

66 0 0 0 

(27) 

0 11 1 0 

0 1 88 0 

0 0 0 44 


Thus, in this case A = 6 and V 1 = 6. Dunng the factorization phase, 
many of the zero- value terms appearing in Eq. (27 ) may become 
nonzero. For maximum efficiency of storage and solution time, the 
equations are reordered so that the number of nonzero terms that 
occur dunng factorization are minimized. These extra nonzero terms 
created dunng the factonzation of [A] are referred to as till ins and 
are denoted by the symbols F in the following equation: 


[A F (\ J . X)] 


X X X 0 X X 

X F X F F 

X F F F 

X X F 
X F 
X 


(2$ l 


in Lq. (28), onelus eight extra (or new; nonzero nu ms. Asa result, 

NF = 8 (29) 

N2 = AH 4 NF = 648 = 14 (30) 

In general, the number of nonzero coefficients in the upper triangular 
pan of [A] after factorization (N 2) is much larger than those before 
factorization (AM). 

The purpose of reordenng algonthms [multiple minimum degrees 
(MMD), nested dissection, or METIS algorithms) is to rearrange 
the nonzero terms of [A], defined in Eq. (27), to different locations 
so that ;V 2 is minimized. 5 17 “ :: ' For example, applying the MMD 
reordenng algorithm to [A] will result in the following permutation 
and inverse permutation sectors: 

{/P(.V)J = {5.6.3. 1, 4. 2} r . \l\'(N)\ = {4. 6. 3. 5. 1.2) r 

(31 i 


With the permutation array {IP}, the matrix [A] in Eq. (27) can be 
transformed into 


[AtfGV..VV) 


11 0 0 1 0 2 

0 44 0 0 3 0 

0 0 66 0 4 0 

1 0 0 88 5 0 

0 3 4 5 110 7 

2 0 0 0 7 112 


(32) 


Now. if one factorizes [A*], there will be only one fill in that 
occurs, as follows: 


| A/i/:{ A. A)] 


"X 0 0X0 X* 

X 0 0 X 0 

X 0 X 0 

X Y F 

X Y 
X 


(33 > 


B. Sparse Symbolic Factorization 

The reordered matnx [ A* J can be described by the following four 
one-dimensional vectors: 


[IA(S - li) = {1.3. 4. 5. 6. 7. 7| r 


{.Ml.Yl.i = (4. 6. 5. 5. 5. 6}' 1 34 > 


{AD(X)) = {11.44.66.88. 110. 112)' 

{AA'f.Y 1 ij = {1.2. 3.4.5. 7| r <35> 

In this example. .V =6 and .V 1 = 6. Before performing the numer- 
ical factorization, it is necessary to go through the sparse symbolic 
factonzation. so that the following hold true: 

1) The nonzero pattern of [A/?y] can be determined (including 
the locations of fill ins ). 

2) The value of N2 can be determined so that adequate com- 
puter memory can be allocated for the subsequent sparse numerical 
factorization phase. 

On completion of the sparse symbolic factonzation phase, the 
nonzero patterns of [ Arf] are completely known, and the modified 
versions of Eqs. (34) and i?5) for the factored matnx [ A/?f j can be 
computed as 


{/A( /V -r 

1)1 = (1.3. 4.5. 7.8. S!' 


{/A(A f 2 

)! = 14. 6.5. 5.5.6. 6| r 

(36i 

N 2 = 

V 1 - ,V£ = 6 + 1 = 7 

(37) 
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Efficient sparse symbolic factonzation algonthms and detailed 
FORTRAN coding can be found elsewhere. 1 


C. Finding Supemodes 

To understand the concept of a supemode tor master node), notice 
that, in Eq. (33). rows 2-3 and 4-5 have the same nonzero patterns. 
That is. the nonzero terms in rows 2-3 correspond to the same 
column numbers. Equation (33) can be used to define a master DOF 
vector 

[MS(N)} = (1.2. 0,2.0. 1} T (38) 

The master DOF vector {MS} is based on the assumed system matrix 
[ Arf ] defined in Eq. (33). Once Eq. (3S i has been defined, effective 
loop-unrolling techniques 2 23 can be used to improve computational 
speed during the sparse numerical facionzation phase. 

D. Sparse Numerical Factorization Phase 

The strategies employed in this phase are quite similar to the ones 
used during the sparse symbolic facionzation phase and have been 
well documented in the literature. 5 - 24 The reordered system matrix 
[ A H } can be decomposed or factorized as 

(^] = [ar][n r (39i 




Source none 
pressure. p t 


Fig. 2 I nree-dimerisioaal duct and coordinate system. 

A. Mathematical Formulation 

The mathematical formulation of the duct acoustics problem 
(Fig. 2) does not lead to a boundary value problem that is for- 
malh self-adjoint and will not lead to a symmetric system ma- 
trix when airflow is considered. Thus, the analysis in the foregoing 
discussion does not allow for airflow because the current paper fo- 
cuses on symmetric systems. With zero airflow in the duct = 0 j_ 
the mathematical problem is to find the solution to Helmholtz'^ 
equation 15 

V 2 p + k~p = 0 (44) 

Along the source plane of the duct t: = 0». the boundary condition 
is given in term of a Dirichiet boundary condition: 

p ~ p, (45i 

The wall boundary condition is 


E. Solution to the System Matrix Equation 

The solution to the system matrix equation [Eq. (12)] is obtained 
in three phases: 

1) In the first phase (forward solution phase), an intermediate 
solution vector \Q>ff I is computed from the solution of the matrix 
equation 


= — IK — (-40 f 

dn 

At the duct termination iz — L ). the ratio of acoustic pressure to the 
axial component of acoustic panicle velocity is proponionai to the 
known dimensionless exit impedance. When expressed in terms of 
the acoustic pressure, this boundary condition is 


[£}\<*>Ff} = \Fh) (42) 

2) In the second phase (backward solution phase), a vector {$bs} 
is computed from the matnx equation 

mC] T {*BB) = 14 >ff) (43) 

3) In the third phase (back transformation phase ). the vector ( <&bb } 
is transformed back to the onginai unknown vector {<J>} by utilizing 
the inverse permutation vector {/V}. 

IV. Three-Dimensional Aeroacoustics Application 

The developed algorithm will be exercised to study the propa- 
gation of acoustic pressure waves in a three-dimensional duct lined 
with sound absorbing materials ( acoustic liners ) as depicted in Fig. 2. 
The duct is spanned by axial coordinate r. transverse coordinate .v, 
and span wise coordinate y . The source plane is located at r = 0. and 
the source plane acoustic pressure p , is assumed known. At the exit 
plane, the dimensionless exit acoustic impedance £«» is assumed 
known. In the duct, air is flowing-along the positive c axis at a sub- 
sonic speed of w 0 , and the duct has acoustic liners along its upper, 
lower, and two sidewalls. The duct walls are assumed to be locally 
reacting so that the sound absorbing properties of the acoustic liners 
results from the dimensionless wall impedance £ that is assumed 
known. The sound source pressure, dimensionless exit impedance, 
and dimensionless wall impedance are assumed functions of posi- 
tion along their respective boundaries. 


OR Sexit 

Equations (44-47) form a well-posed boundary value problem 
for which exact solutions for the acoustic pressure field are gener- 
ally not known. A solution for the acoustic pressure field satisfying 
this boundary value problem is required to predict and reduce the 
radiated noise. An approximate solution for the acoustic pressure 
field can be obtained using numerical techniques such as the finite 
element method. 

B. Finite Element Model 

The approximate solution for the sound field in the duct is ob- 
tained by subdividing the duct and representing the acoustic field 
within each subdivision by relatively simple functions. Because the 
duct of interest is a rectangular prism, the computational domain is 
divided into a number of smaller rectangular prisms (or elements) 
as shown in Fig. 3. These elements are considered interconnected at 
joints called nodes. The most widely used method for locating the 
nodes in the discretization is to divide the physical volume into NX , 
and NZ grid lines in the a\ y . and r directions, respectively, as 
shown in Fig. 3. Each node of an element can be located by iden- 
tifying an ordered triplet. >v, z k ). Similarly, each element in 
the assemblage can be identified by an ordered triplet of integers 
(/, J. K). A typical rectangular pnsm element (/, y, K) is shown 
in Fig. 4. Each element consists of eight local node numbers labeled 




Fig. 3 Three-dimensional finite element discretization. 



The linear combination JEq. (49)] comprises a complete set of basis 
functions. 

For a typical element (/, /. K ). contributions to the minimization 
of the held error function due to local node m over the computational 
volume V are 

J E r N m dV = j^lV : p + k : p]N m dV (51) 

The second derivative terms in Eq. (51) are reduced to first deriva- 
tives using Green's second identity 


dV = JT [-{ V}p • { V}W m -j- k z pN m ] dV' - j d5 


A 



Fig. 4 Typical three-dimensional dement and local node numbering 
system. 


i. 2 8. Each element is considered to have a dimension of h . 

and / in the a . y. and : directions, respectively, as shown. 

C. Element Stillness Matrix 

GalerkinY finite element method is used to compute the element 
stiffness matrix. The neld error function is defined as 

£ = V~{) -f k~ p (48) 

Within each element, p is represented as a linear combination of 
eight functions. .Y ; . .Y : 


ni = s 

p = y"! V (49) 

m = 1 


Elimination of the second derivative terms from the volume inte 
gral in Eq. (51) is required so that the linear basis functions N m 
can be used. Elimination of the second derivative terms from the 
volume integral also has the advantage that ail impedance boundary 
conditions can be incorporated into the surface integral of Eq. (52). 
This allows a choice of basis functions that do not have to satisfy 
explicitly any impedance boundary conditions. The contribution to 
the surface integral 


L 


dp 

— N~ d5 

tin 


(53) 


is identically zero for all elements except those that lie along 
an impedance boundary'. Substituting ihe exit boundary condition 
[Eq. i-Tij mto the surface integral in Eq. t53 > gives 



[%N. 

„ dS = -ik 

P 

-r~: \\ dS 


(54) 


JS l * n 

Js 



along 

the exit boundary. 

whereas for elements that 

lie aloiii 

: the 

upper. 

lower, and sidewalls of the duct 





f — S 

dS = -ik j 

f - V„, dS 


(55) 


Js < ln 

J 

.t c 




The contribution to the minimization of the field error for each 
element, when collected for each of the eight local nodes m , is 
expressed in matrix form as 


/ 

/ 

/ 


E,N,dV 


E r N Z iiV 


ErNtdV 


= [A i J «']{<*> ' J « 


(56 1 


V; = (' _ s)(t)(' _ ?) "’“T. 

•'■‘■(iK'-jK 1 -?) 


In Eq. 56). JK] ) is an 8 x 1 column vector for each element 
containing the unknown acoustic pressures at the eight loeui nodes 
of the element 

{ c t > ; ' c ’)‘ = {p\- Pz- py- Pi- Pz- Pf p-. P»\ (57) 

The element matrix [4 {/ y A l ] is an 8 x 8 complex symmetric 
matrix for each element (/, J . K). In the special case of a hard wall 
duct i ; — oc ). 




[V). 

m+iB]. 


K = I.VZ - I)| 
K = tSZ - 1 ) j 


3 


( 58 ) 


Here. \V] represents the contribution to due to the element 

volume \ T . whereas f B ] represents the contributions due to the exit 
plane boundan The matrices [P] and [5] are symmetric, and their 
coefficients ha\e been computed explicitly: 


k 2 whl 
VP] = 

11 ->th 


4 : i 


vol hi wh 

sbh 36 u 36 / 


4 2 14 8 


f? — fiexix VX f . Xj ) -j- ficxil (' / . }'j — J ) 

Arxit (A / -► 1 « }'j + 1 ^ 4" (*V/ • Vy — I ) 

/a — (A / . \'J ) + (.Y / . Vy -r 1 ) 

~ A:ui<-V/ * | . Vy * I ) + 3^ CXI , (A / . Vy * 1 ) 
./? = 3/i exit (■V/. >v) + 9/J exit (-V/. \ / _ i ) 

— 3 A rx:ti-V/ _ I . Vy _ J ) 4 &\nU/ . > 7 - i ) 

= ftrxuLl / . Vy ) 4- 3/*e\,i(-V/ . Vy j ) 

— 3/v v ,U/ * i , V/ | ) 4* A?*u(-V/ . Vy . j ) 
/- = ftx.tU/ . Vy ) 4- 3/* eMl (.Y, . Vy - j ) 


2 4 4 4 4 8 


“ 9ft. XI ,u i - 1 . y./ ♦ i ) 4- L'XIl (.V/.Vy.,) 


4 2-2-4 2 1-1 

2 4 -4 -2 1 2-2 

-2 -4 4 2 -1-2 2 

-4-2 2 4 -2 -1 1 


1 2 - 2-1 
- 1-2 2 1 


4 2-2-4 

2 4-4-2 


J ' = Vy ) 4“ Pcmi (A/. Vy * j ) 

— 3[0 ovl ,(V/ ^ 1 , Vy * j) 4- ft exit 1 A / . X j _ ] ) ] 
= 3/J CX |,(A / . Vy ) 4- ftrxitlA / . Vy _ | ) 

- 3/J, KlT l.Y/ . j. Vy - | ) 4 9/)*, xlI (A/ . Vy _ , ) 


in which 


2 -4 


4 -4 -2 2 2 -2 -1 1 


"i _ 


2 2 4 -4 -1 1 2 

2-2-4 4 1-1-2 


4 -4 


-1 -4 


_ *> "> -> 


4 -4 


-4 -2 -1 -2 

-2 -4 -2 -1 

-I -2 -4 -2 


-2 -4 


-4 -2 -1 -2 4 

-2 -4 -2 -I 2 

-2 — \ -2 l 


0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 


151= -K 


0 0 0 0 0 0 0 0 

0 0 0 0 j\ J: j\ u 

0 0 0 0 /: U ft h 

o o o o h ft fi h 

0 0 0 0 U h h h 


V. Results and Discussion 

The three-dimensional rigid wall acoustic element has been cou- 
pled with the sparse assembh and equation solver algorithms to 
pro\ ide assembly and solver statistics for a three-dimensional duct 
aeroacoustics application. Computations presented in this paper 
were run on a single processor with double-precision < 64-bit i arith- 
metic on an ORIGIN 2000 computer platform. The sparse equation 
sober used MN1D reordering. Computations are presented for a 
uniform grid and a geometry identical to that of the Langley Flow 
Impedance Tube. This three-dimensional duct has a square cross 
section 0.050S m in width (U = H =0.0508 mi and 0.812 m in 
length [L =0.812 m). A more detailed description of the duct is 
given in Ref. 15. All calculations were performed ai standard at- 
mospheric conditions without flow, and the source frequency was 
chosen to span the full range of frequencies currently of interest in 
duct liner research. The sound was chosen as a plane wave ( p, = 1 ). 
and the dimensionless exit impedance was chosen as unity ( £ exj , = 1 h 
This exit impedance will simulate a nonreflecting termination for 
the plane wave source. 

Table 1 presents CPU statistics ( in seconds ) for each of the three 
assembly algorithms and the sparse equation solver as a function of 
the source frequency /. in kilohertz. The CPU time for ihe solver 
t column 9 1 is that required to obtain the solution vector after the sys- 
tem matrix was assembled. Note that before obtaining the solution 
vector, the system matrices obtained from each assembly algorithm 
were compared to each other. Each assembly algorithm assembled 
the identical system matrix as expected. Also included in Table 1 are 
the number of grid lines NX. NY. and NZ and the matrix order .V that 
w ere used to perform the computations at each frequency. Here we 
have used the generally accepted rule that 12 points per wavelength 
is required to resolve a cut-on mode in each coordinate direction. To 
establish the accuracy of the solver solutions, the relative error norm 
iRelern. computed from the solver solution vector, was tabulated in 
the final column of Table 1. The relative error norm* is defined as 


f\ — 9 /? exiI ( A , Vy ) 3 / 3 ex „(A/ , Vy + j ) 

4- ficxv. [ - X - i ♦ }'J - 1 ) + 3 Arx,, (a / , Vy + \ ) 
fz = 3/? exi! (A . Vy ) - 1 - ftcxn Li / , Vy + j)] 


Reierr = 


[EN r X (£N} r 

{E] m x \F] T 


yy^i 


) * ficxu (A/ . Vy 1 ) 


{HV} = [yi]{4>}-{F} 


/ 

sx 

vr 

NZ 

N 

Algorithm i 

Algorithm 2 

Algorithm 3 

Solver 

Relerr 

4.00 

6 

6 

114 

4,104 

4920 

034 

0.22 

6.00 

6.5 x 10-’ 5 

7.00 

12 

12 

200 

28.800 

106.80 

230 

1.75 

22.80 

1.8 x 10 -12 

11.00 

18 

18 

313 

101.412 

U23.80 

9.05 

228 

487.80 

7.5 x 10- 12 

14.00 

24 

24 

399 

229.824 

5.520.60 

20.74 

14.24 

3.120.00 

3.4 x lO -11 

17.00 

30 

30 

484 

435.600 

19.488.00 

39.78 

26.94 

10.440.00 

3.2 x 10- 11 

21.00 

36 

36 

599 

776.304 

N/A 

73.81 

48.35 

N/A 

N/A 


Table 2 RAM statistics (in megabytes) for the sparse algorithms 


/ 

N 

.VI 

Algorithm 2 

Algorithm 3 

Solver 

4.00 

4.104 

41.468 

0.47 

0.46 

4.00 

7.00 

28.800 

331.244 

21.00 

11.00 

80.00 

11.00 

101.412 

1.216.118 

72.00 

37.00 

640.00 

14.00 

229.824 

2.812.838 

165.00 

83.00 

2140.00 

17.00 

435.600 

5.396.600 

3 i 7.00 

158.00 

8.100.00 

21.00 

776.304 

9.696.158 

551.00 

283.00 

N/A 


Tabular results at 21 kHz are not presented for assembly algorithm 1 
and the sparse equation solver because of the excessive CPU time 
required by these two algorithms. 

Although algorithm I is extremely simple, its performance is 
extremely slow (Tubie i ). Note that algorithm 1 is 145 times slower 
than the other two algorithms at a frequency of 4 kHz and more 
than 490 times slower at 17 kHz. Tabular results also show that the 
CPU time required to assemble the system matrix using algorithm 1 
exceeds that required to obtain the solution vector by 904K s (or 
S7G ) at 17 kHz. At low frequencies, algorithm 2 is only slightly 
slower than algorithm 3. but as the frequency increases to 17 kHz. 
algorithm 3 is 32G faster than algorithm 2. Generally, the higher 
the frequency, the better the performance of algorithm 3. relative to 
that of algorithm 2. Furthermore, in using algorithm 2. the user has 
to guess the maximum number of nonzero terms per row i \/Z) to 
allocate the RAM for the matrix |//A 1. Also, the CPU times required 
to assemble the system matrix using algorithm 2 or algorithm 3 are 
both more than two orders of magnitude less than the time required 
to obtain the solution vector. Finally. Relerr is small, indicating that 
the solver solution is accurate. 

Table 2 shows the RAM (in megabytes) for algorithm 2. algo- 
rithm 3. and the sparse equation solver. RAM statistics for algo- 
rithm 1 were not tabulated because its performance was extremely 
slow when compared to algorithm 2 and algorithm 3 (as shown in 
Table 1 ). Values of the variables ;V and .V 1 are also given in Table 2. 
The results show that the number of off-diagonal nonzero coeffi- 
cients (V l ) is an order of magnitude larger than .V. Table 2 also 
shows that algorithm 3 requires less memory than algorithm 2 be- 
cause algorithm 2 must allocate RAM for storing vectors \IR). |JC). 
and [//A] [see Eqs. < 13-15)]. Note also that memory required by the 
sparse equation solver is substantially larger than that required for 
assembly algorithm 2 or algorithm 3. This is further verification that 
most of the RAM allocated is used during matrix factorization. Pre- 
liminary results from tests conducted by the authors have suggested 
that the performance of the sparse equation solver may improve if 
the solver were to use METIS instead of MMD reordering. Forex- 
ample. at 7 kHz the number of nonzeros after factorization i ,V2 1 was 
reduced from 4.736.991 with MMD reordering to only 4.3" , 6.496 
when the METIS reordering algorithm was used. 

VI. Conclusions 

A template for symmetric sparse equation assembly and solutions 
on an unstructured grid has been presented. The accuracy and nu- 
merical performance of the sparse algorithms have been evaluated 
over the frequency range of interest in a three-dimensional aeroa- 
coustics application. Based on the results of this study, the following 
conclusions are drawn: 

1 ) Assembly aigonrhm 1 is impractical for sysiem matrix assem- 
bly at high values of source frequency. It requires up to 87 r r more 
CPU time to assemble the system matrix than the sparse equation 
solver requires to obtain the solution vector. 


2) Assembly algorithms 2 and 3 have nearly equal performances 
at low values of source frequency, but algorithm 3 gives savings 
in both CPU time (32 Cc) and RAM (507c) at the higher values of 
source frequency. 

3) Error norm statistics show’ that the sparse equation solver com- 
putes accurate acoustic solutions over the frequency range of interest 
for the three-dimensionai aeroacoustics application. 

4) At high frequency ( 17 kHz 1. the sparse equation solver requires 
low memory, but requires significant speed-up before optimization 
studies (either of the duct geometry or liner material properties) are 
practical. This research supports a recommendation. Therefore, that 
a parallel version of the sparse solver be developed. The CPU time 
and RAM required by assembly algorithms 2 and 3 are two orders of 
magnitude smaller than that required by the sparse equaiion solver. 
These algorithms can. therefore, be conveniently incorporated into 
a substructunng lor domain decomposition) formulation (provided 
that each substructure is handled by different processors) to take 
advantage of parallel computation to further reduce CPU time and 
RAM. ” 
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Ins text is a practical guide to 
. buiuimg Kalman filters and 
show s how the filtering equations 
can be applied to real-life problems. 
Numerous examples are presented in 
detail show ing the mans wavs in 
which Kaiman filters can be 
designed Computer code written in 
FORTRAN, MATLAB, and True 
BASIC accompanies all of the exam- 
ples so that the interested reader can 
verify concepts and explore issues 
beyond the scope of the text. 
Sometimes mistakes are introduced 
intentionally to the initial filter 
designs to show the reader what hap- 
pens when the filter is not working 



properly. The text spends a great 
deal of time setting up a problem 
before the Kalman filter is actually 
formulated to give the reader an 
intuitive feel for the problem being 
addressed. Real problems are sel- 
dom presented in the form of dif- 
ferential equations and they usually 
don't have unique solutions. 


Therefore, the authors illustrate several 
different filtering approaches for tackling 
a problem Readers will gain experience in 
software and performance tradeoffs for 
determining the best filtering approach 
for the application at hand. 
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3. Parallel Finite Element Domain Decomposition for Structural/ 
Acoustic Analysis: Symmetrical Case [Ref. 6.2] 

(Please refer to the attached Journal Paper for this section) 
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Abstract. A domain decomposition (DD) formulation for solving sparse linear systems of 
equations resulting from finite element analysis is presented. The formulation incorporates 
mixed direct and iterative equation solving strategies and other novel algorithmic ideas that 
are optimized to take advantage of sparsity and exploit modern computer architecture, such 
as memory and parallel computing. The most time consuming part of the formulation is 
identified and the critical roles of direct sparse and iterative solvers within the framework 
of the formulation are discussed. Experiments on several computer platforms using real 
and complex test matrices are conducted using software based on the formulation. Small- 
scale structural examples are used to validate the steps in the formulation and large-scale 
(1,000,000+ unknowns) duct acoustic examples arc used to evaluate the parallel performance 
of the formulation. Results are presented using 64 SUN 10000. S SGI ORIGIN 2000 proces- 
sors, and a cluster of 6 PCs (running under the Windows environment Statistics show that 
the formulation is efficient in both sequential and parallel computing environments and that 
the formulation is significantly faster and consumes less memory than that based on one of 
the best available commercialized parallel sparse solvers. 


Mathematical Subject Classification : 74S05.15A06, 
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1. Domain decomposition (DD) formulation for finite element analyses 

Application of finite element analysis to engineering problems leads to the discrete 
equation system [1, 2] 

m = (5} . (i.i) 
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where S, Z are vectors of length A T that contains the known nodal loads and un- 
known nodal quantities, respectively. Here K is a complex, nonsingular, svmmet- 
ric/unsymmetric, NxN sparse matrix. Although (1.1) assumes a single loading con- 
dition (i.e., right hand side), multiple loading conditions may be treated by taking 5 
and Z as dense matrices, so that the j ih column of Z corresponds to the N unknown 
nodal quantities associated with the loadings in the j th column c: 5- 

Using the domain decomposition concept, (1.1) is written in partitioned form 

[&*:]{*}-{.*} <“> 
where submatrices Kjb*Kbi*Kii and I\bb have dimension nix::, nxm. mxm and 
nxn, respectively. The interior and boundary' unknowns (i.e.. Z- and Zb) have di- 
mensions compatible with the columns in Ku and Kbb, respectively. 

Eliminating the interior unknowns from (1.2) gives 


KbZb = Fjb , (1.3) 

where 

Kb = K B b + K bj Q , (l* 4 ) 

Q = -KJj'Kib , (1.5) 

F B = S B + Q , (1.6) 

Q = -A'b/S/ , (1.7) 

5/ = A7/S/ . (1.8) 


Here Kb is the boundary 7 stiffness matrix for the domain. Fq is the vector of boundary 
forces, and the superscript (i.e.. — 1) denotes the matrix inverse. Efficient sparse 
algorithm [3]-[ll] may be used to decompose sparse matrix A;; and solve for matrix 
Q in (1.5) and the vector Sj in (1.8). 

In the current DD formulation [12, 13] the computational domain is decomposed 
into L subdomains and Kb and Fb are synthesized by considering contributions 
from all subdomains. For this purpose, the discrete equation system for a subdomain 
(which is considered an isolated free-bodv) is expressed in the form 1.2 : 



i 1 

j / 4 r) 1 f 4 r) l r = 

KV\\zy) { Sj r) J ’ 

•I (1.9) 

where r refers to the 

r th subdomain. Let and represen: 

the number of 

boundary 

and interior unknowns of the r th subdomain, respectively 

. The solution 

of (1.9) is 


r^( r ) |7( r ) p( r ) 

A B "B ~ r B » 

(1.10) 



Z { d = - K^ZV) . 

(1.11) 

where 


4 r) = <> • 

(1-12) 



Q (T) = , 

(1.13) 


15 




• Parallel FE domain decomposition for structural /acoustic analysts 191 

F { b = 4 r) + Q (T) - (1.14) 

Q {T) = -Al r ]sj r) , (1-15) 

Sj r) = . (U6) 

Finally, Kb and Fb may be obtained explicitly from the equations 

K B = Y,^ (t) ) Tk b ) 0 {t) . F B = S B -r y 3(/? <r) ) r ((? ,r ') T 5j r) , (1.17) 

where is a Boolean transformation matrix of dimension ti ! ' xn^ r ' . 


The sequence of steps constituting the DD formulation proposed in this paper is 
as follows: 

(1) Decompose the large-scale finite element domain into L smaller subdomains. 
Algorithms and software given in [12. 13] are used for this purpose. 

(2) Compute ^bb- * and S B * us* 11 © efficient sparse assembly 

algorithms [3, 11]. 

(3) Factorize the sparse matrix and compute Sj 7 ^ using (1.16) and 
using (1.13). Algorithms and software for sparse symbolic and numerical 
factorization, loop unrolling techniques, equation reordering, and forward- 
backward solution phases ([3]-[ll]) are utilized at this step. 

(4) Compute and for each subdomain. 

(5) Compute Kb and F B from (LIT). 

(6) Solve (1.3) using a direct or iterative solver to obtain the boundary unknowns, 

Zb- 

a) : Efficient parallel direct dense solvers given in [14 - [16] may be utilized 
at this step provided that A' B is formed explicitly. 

b) : However, explicit computation of Kb is an expensive operation due to 
the need to perform the inner produce K^}Q [r) in (1.12). 

c) : Iterative solvers (such as the preconditioned conjugate gradient algo- 
rithm) [17] are therefore recommended for this STep in the formulation. 
The use of an iterative solver eliminates the need to form I\ B explicitly 
because each stage of the iterative solution typically requires only the 
product a matrix (K B b + Kgj)Q {r) with a known vector. 

(7) Finally, the solution for the interior unknowns are obtained from (1.11) by 
using the factorized sparse matrix I\ during the forward and backward 
substitution phases. 

The solution vectors obtained from the formulation are post-processed to obtain other 
quantities of interest such as stresses, strains, acoustic energy, etc. The remaining sec- 
tions of this paper will focus on issues related to efficient sparse assembly procedures. 
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2. Simple finite element model 

To facilitate the discussion a simple finite element model which consists of 4 rectan- 
gular elements with 1 degree of freedom (dof) per node and its loading condition 



element in Figure 1 are given by the following “element-dof" connectivity matrix. E 


1234567S9 



The number of rows and columns in E correspond to the total number of finite ele- 
ments (4 finite elements) and degrees of freedom (9 dof). in the model. The following 
2 integer arrays describe the nonzero structure of £ in a row oriented format 

{IE} t = {1.5.9. 13. 17} . (2.2) 

{JE} T = {3, 8. 1,6. 7. 3, 2,4, 5. 2. 3. 6.7-9. S. 3} . (2.3) 

The array IE contains the starting location of the first nonzero terms for each row of 
matrix E while JE contains the global dof number associated with each e tr: =1.2. 3.4) 
rectangular element. This format is called compress row format. Similarly, the trans- 
pose of the matrix E can be described by two integer arrays. IET and JET. 

The “exact" numerical values for the 4x4 element stiffness matrix K Kt is 'unim- 
portant" at this stage of the discussions and are assumed to be given by the following 
formulas: 
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[i$r (3) ] = 
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(2.5) 


Note that the global row and column numbers for the e th rectangular element are 
easily obtained from JE. For example, the global row and column numbers 7.3.2, 
and 4 for rectangular element 2. are contained in the 5*6,7, and S tfl element of JE. 
Further, the “simulated” element stiffness matrices. K (e \ are unsymmetrical in value 
but symmetrical in locations. For example. has a nonzero term of 14 at location 
(3,4), and there is also a nonzero term of -14 at location (4.3). Following the usual 


neraem - nirju r r -g flutt n m iigunvi - 


where 


\K]{Z) = {S} , 


f^] = Ef A '] (e) - 

c= 1 


;. 2 . 6 ) 


(2.7) 


For example, + K ( 3 3) 3 + K { 3 ] 3 = 2 + S 4 18 -r 32 = 60 as indicated 

in (2.8) 
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. 2 . 8 ) 


{S} T = {4.44,10.-20.42,-40.84.28.48} , (2.9) 

where F represents fill-in terms (shown in the upper triangular portion of matrix I\ 
only) that occur during factorization of K. 
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The solution to equation (2.6) is: 

{Z} T * {1,1, 1,1, 1.1. 1.1.1} (2.10) 

which has been independently confirmed bv the results of the computer program 
developed. In the upper triangular portion of K. there are 9 fill-in terms. In order to 
minimiz e the number of fill-in terms during the symbolic and numerical factorization 
phases, reordering algorithms [10, 13] such as multiple minimum degree (MMD), 
Nested-dissection (ND), and METIS are used in the DD formulation. 


3. Symbolic sparse assembly for symmetrical and unsymmetrical matrices 

It is useful to understand the symbolic sparse assembly for “symmetrical** matrices 
[3, 11] before proceeding to the unsymmetrical case. Figure 2 gives a ‘ pseudo’* 

jp~i 

— » Do $ 01 * 1 . HM 1 <* K- 1 ) fix each P rw of [A] 

JP1* JP 

If (P nnv comesjxrod to b.c » Cm to 30 


Do 20 IP = cousuha alt tkiuctibs tfiadud to I* 1 toi deft 
“ * Do 1 0 KP - coupler all nodes ux riof ) 



; do an 3 13 cm r a eto mersts) 


{ asseublmfi. Lwti Tnauyic of [A], rim to SYM 


*-*!>© NOT wad to lutkulc Ok ck*l which afaea <1v accixutcd 
for mom oUiei ttcotcms) LA (K* * 1 

•Record H»e cohum «E ton ti* P row t wliu li ia> 

UHHXtn term JAfJPt * K 

‘Increase the counter IP by l «« vouipuiiiu: NCVEF1 * 
JP* JP+1 
10 Commie 


20 Ccutwne 


136 IAtl»* 3P1 


HCOEFi - JP 
IA(Nt-JP 
lAfN + l> = JP 

Figure 2. Pseudo Fortran codes for symmetrical symbolic sparse assembly 

Fortran coding of the symmetrical sparse assembly procedure. Only minor changes 
in this symmetrical assembly procedure are required to extend it to unsymmetrical 
matrices. 

In a symmetric matrix the “lower triangular" portion of K is identical with the 
“upper triangular portion." Thus, the upper triangular portion of K (neglecting fill-in 
terms (2.8)) can be represented in compressed row format by the following 2 integer 
arrays: 

{L4} t = {1,4,9,15,16,17.15.20.21.21} . (3.1) 
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{ JA} t = {3, 6. S, 3. 4, 5, 6, 7. 4, 5, 6, 7. 8. 9- 7, 6, S. S. 9. 9} , (3.2) 

where the array 1A contain the starting location of the first non-zero, off-diagonal 
term for each row of the upper triangular portion of K. The difference between any 2 
consecutive integers on the right-hand-side of (3.1) will give the number of non-zero 
(off-diagonal) terms in a particular row of the upper triangular portion of K. For 
example, IA(3)-IA(2) = 9-4 = 5. Hence, there are 5 non-zero terms (excluding the 
diagonal)) in row 2 of the upper triangular portion of matrix K. Additionally. J A 
contains the column numbers, associated with each non-zero, off-diagonal term for 
each row of the upper triangular portion of matrix K. Note that I A and J A arrays 
can also be obtained from the “pseudo" Fortran coding shown in Figure 2. 

The following minor changes in the coding of Figure 2 are required to perform 
unsvmmetrical assembly. 

a) : DO 30 1= 1, N (the last row will NOT be skipped) 

b) : Introduce a new integer array IAKEEP(N-bl) which plays the role of array 
IA(-), for example: IAKEEP(I)= JPI . 

c) : Remove the IF statement in Figure 2 that skips the lower triangular portion. 
As a consequence of this, the original array I A (-) will contain some additional, 
unwanted terms. 

~~~ 3):~Tfie dutjnirTroin - the ^isyrnmetricaT* sparse assembly will be stored by 
IAKEEP(-) and JA(-). instead of IA(-) and JA(-) as in the symmetrical case. 


4. Applications 

4.1. Software. The software that is based on the parallel DD formulation presented 
in this paper has been developed. The parallel algorithm uses the message passing 
interface (MPI) for interprocess communication and is therefore highly portable. The 
software developed is referred to as the direct iterative parallel sparse solver (DIPSS). 
DIPSS (in FORTRAN) incorporates a number of lower level routines and provides op- 
tions for both real and complex matrices in double precision (i.e.. 64-bit arithmetic). 
Results are presented for symmetric matrices only. We use sparse factorization tech- 
niques presented in [3] and implement the preconditioned conjugate gradient iterative 
solver [17] to solve the dense system (1.3). The following three examples are used to 
evaluate the proposed parallel DD formulation. Performance gains are particularly 
evident for large problems. 

4.2. Example 1- Mixed finite element types. This is a structural example for 
which the equation system is real and symmetric and has more than 1 finite element 
type. The entire finite element model is shown in Figure 3 and consists of 2-node 
“line" elements, 3-node ‘triangular * elements, and 4-node “rectangular" elements. 
Interior and boundary nodes are denoted by open and filled circles, respectively. This 
small-scale, finite element model is decomposed into 3 subdomains as indicated in 
Figure 4. The primary purpose of this example is to validate all intermediate and 
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Substructure 1 


Substructure 2 


Figure 4. Decomposition of mixed model into three subdomains 


final numerical results using the DIPSS software. This small-scale example was also 
solved in Matlab using separate software packages. Although numerical results are 
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not presented for the sake of brevity, excellent comparison between DIPSS and the 
Matlab solution was obtained for this small-scale example problem. 

4.3. Example 2 — Three dimensional structural bracket finite element 
model. The DD formulation has also been applied to solve the 3-D structural bracket 
problem shown in Figure 5. The finite element model contains 194.925 degrees of 
freedom (7V= 194.925) and the elements in the matrix, K. are real Results were 
computed on a cluster of 1-6 personal computers (PCs) running under the Windows 
environment with Intel Pentium IV processors. It should be noted that the DIPSS 
software was not ported to the PC cluster, but the DD formulation was programmed 
(from scratch, in C^) on the cluster. 



Figure 5. Finite element model for a three-dimensional structural bracket 
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The wallclock time (in seconds) to solve this example is documented in Table 1. A 
super linear speedup factor of 10.35 has been achieved when 6 processors were used. 


# of PC processors 

1 

2 

3 | 4 

5 

6 

Wallclock time (sec) 

2,670 

700 

435 405 306 

258 

Speedup factor 

1.00 

3.81 

6.14 6.59T S.73 

10.35 


Table 1: 3-D Structural bracket model (194.925 dofs. K real) 


4.3. Example 3 — Three dimensional acoustic finite element model. In this 
final example, DIPSS is exercised to study the propagation of plane acoustic pressure 
waves in a 3-D hard wall duct without end reflection and airflow*. 



Figure 6. Finite element model for a three-dimensional hard wall duct 


The duct is shown in Figure 6 and is modelled with brick elements. The source 
and exit planes are located at the left and right boundary, respectively. The ma- 
trix, K, contains complex coefficients and the dimension of A is determined by the 
product of NN. MM. and QQ (N=M M xNNxQQ ) . Results are presented for two 
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grids (A 7 =751,513 and A'— 1.004,400) and the finite element analysis procedure for 
generation of the complex stiffness matrix, AT, was presented in another paper [18]. 

DIPSS memory and wallclock statistics were also compared to those obtained using 
the platform specific SGI parallel sparse solver (i.e., ZPSLDLT). These statistics were 
computed on an SGI ORIGIN 2000 computer platform that was located at the NASA 
Langley Research Center. The SGI platform contained 10 gigabytes of memory and 
eight ORIGIN 2000 processors were used. It should be noted that ZPSLDLT is part 
of the SCSL library (version 1.4 or higher) and is considered to be one of the most 
efficient commercialized direct sparse solvers that is capable of performing comple x 
arithmetic . Due to the 3-D nature of hard wall duct example problem. A' encoun- 
ters many fill-in elements during the factorization phase. Thus, only the small grid 
(A'=751,513) could fit within the allocated memory on the ORIGIN 2000. ZPSLDLT 
required 6.5 wallclock hours to obtain the solution on the small grid whereas DIPSS 
wallclock was only 2.44 hour s. DIPSS also required nearly 1 gigabyte less memory 
than ZPSLDLT, and the DIPSS and ZPSLDLT solution vector (Z) were in excellent 
agreement. 


Because DIPSS uses MPI for interprocess communications, it can be ported to 
other computer platforms. To illustrate this point the DIPSS software was ported to 
the SUN 10000 platform at Old Dominion University and used to solve the large grid 
duct acoustic problem (A" =1.004.400). Wallclock statistics and speedup factors were 
obtained using as many as 64 SUN 10000 processors. Results are presented in Table 
? It, should be, noted that^a superlinear speedup factor of 85.95 has been achieved- 



1 

2 

4 

8 

16 

32 

64 

Assembly time (sec) 

19.38 

10.00 

5.08 

2.49 

1.26 

0.70 

0.27 

Factor time (sec) 

131.229 

58.976 

26.174 

10.273 

3.260 

909 

56 

Wallclock time (sec) 

131.846 

61.744 i 27.897 


■nEna 



• i Hi 

1.00 

2.14 

4.73 

11.22 

34.54 

67.03 

85.95 


Tabie 2: Statistics for 3-D Hard wall duct (A r = 1.004.400. K complex) 


when 64 SUN 10000 processors are used. This super-linear speedup factor is due to 
two primary reasons: 

(1) The large finite element model has been divided into 64 subdomains. Since 
each processor is assigned to a smaller subdomain, the number of operations 
performed by each processor has been greatly reduced. Note that the number 
of operations are proportional to (n^) 3 for the dense matrix, or n (r) -BW 1 2 
for the banded, sparse matrix, where BW represent the half Band Width of 
the coefficient stiffness matrix. 

(2) When the entire finite element model is analyzed by a direct conventional 
sparse solver, more computer ''paging” is required due to a larger problem 
size. 
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5. Conclusions 

_ A domain decomposition (DD) formulation for solving sparse linear systems of 
equations has been presented. The formulation incorporates lower level novel algo- 
rithmic ideas such as mixed direct/iterative sparse solvers, equation reordering, loop 
unrolling, efficient sparse assembly, and foward /backward solution phases that are op- 
timized to take full advantage of sparsity and exploit modern computer architecture. 
Medium to large-scale examples considered in this paper show that the developed 
MPI parallel DD code is efficient in both sequential and parallel computing environ- 
ments. Statistics show that software based on the formulation is significant^' more 
efficient than that based on one of the best available commercialized, parallel, direct 
sparse solver. 
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4. Parallel Finite Element Domain Decomposition [Refs. 6.6, 6.8] 
for Structural/Acoustic Analysis: Un_symmetrical Case 

The finite elelement DD formulation for the "unsymmetrical" matrix 
case is essentially the same as the "symmetrical" case, discussed 
in Section 3 of this report. Following are the major differences 
between these 2 cases: 

(a) Re-ordering algorithms (to minimize fill-in terms), sparse 
assembly strategies, and sparse solvers used in the "symmetrical" 
case need to be modified for the "unsymmetrical" case. 

(b) The Preconditioned Conjugate Gradient (PCG) iterative solver 
for solving the boundary displacement vector (for "symmetrical" 
case) needs to be replaced by the Preconditioned Bi-Conjugate 
Gradient (with stabilized schemes) [Ref. 6.7] for "unsymmetrical" 
case. 

For "multiple" processors computation, mixed iterative (BiCG 
with Stabilized strategies) and direct sparse, unsymmetrical 
sob ers are used. However, the BiCG unsymmetrical iterative 
solver seems to be "NOT robust” enough to solve the unsymmetrical 
finite element acoustic problems. Thus, further investigation on 
this topic is needed ! 

However, for a "single processor" (or serial computation) execution, 
since only the sparse, unsymmetrical "direct" solver is used, the 
obtained finite element accoustic results seem to be robust, and 
reliable. 

The Makefile, all source codes (including all .f, and all .c files), 
input data file ( = fort.501, with explanation ), output data file 
( = fort.700 ) etc... can be obtained from the Pi's (Prof. Nguyen's) 
ODU SUN account, at: 

cd ~/cee/dd_unsym_fem_complex_acoustic_willie/ 

Notice that the main program is stored under file name "cddmain.f '. 
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5. Summary of Accomplishments and Deliverable Items 


Years 1 and 2 (January 2001 - January 2003) 


(a) A first journal paper (on sparse assembly & solver) has been 
published [Ref. 6.1] 

(b) NASA LaRC Finite Element [Ref. 6.12] acoustic assembly time has 
been significantly reduced [refer to Table 1 of Section 2] 

(c) Symmetrical sparse re-ordering algorithms/sofware, such as 
Multiple Minimum Degree (MMD), Nested Disection (ND) [Ref. 6.4] 
have been incorporated into the finite element procedures. 

(d) Symmetrical sparse symbolic & numerical "assembly" software 
have been integrated into the finite element procedures. 

(e) Symmetrical sparse symbolic & numerical "factorization" software, 
including unrolling strategies [Ref. 6.5, Chapter 10] have been 
developed and integrated into the finite element procedures. 

(f) Symmetrical sparse "forward and backward solution” software, 
including unrolling strategies [Ref. 6.5, Chapter 10] have been 
developed and integrated into the finite element procedures. 

(g) A second journal paper (on Parallel FE, DD formulation) has been 
published [Ref. 6.2] 

(h) Parallel DD Preconditioned Conjugate Gradient (PCG) iterative 
solver (for symmetrical matrix case) software has been developed. 

REMARKS: 

METiS [Ref. 6.3] (a software which has the capabilities to 
minimize fill-in terms during the sparse factorization phase, 
and to automatically split a large finite element model into 
smaller sub-domains) has NOT yet been incorporated into the 
developed DD FE code. Instead, "hard coded" domain splitting 
has been used. 


Years 3 and 4 (January 2003 - January 2005) 
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(i) Un_symmetrical sparse symbolic & numerical "assembly" software 
have been integrated into the finite element procedures. 

(j) Un_symmetrical sparse symbolic & numerical "factorization” software, 
including unrolling strategies [Ref. 6.5, Chapter 10] have been 
developed and integrated into the finite element procedures. 

(k) Un_symmetrical sparse "forward and backward solution" software, 
including unrolling strategies have been 

developed and integrated into the finite element procedures. 

(l) METiS [Ref. 6.3] (a software which has the capabilities to 
minimize fill-in terms during the sparse factorization phase, 
and to automatically split a large finite element model into 
smaller sub-domains) has recently been incorporated into the 
developed DD FE code. 

(m) Parallel DD [Ref. 6.6] Preconditioned Bi_CG iterative solver 
[Ref. 6.7] (for un_symmetrical matrix case) software has been 
developed. 

However, the Bi_CG iterative solver (with DD formulation) 
and its communication strategies (amongst different processors) 
need further studies to improve the robustness of the 
algorithms ! 


REMARKS: 

"Extra" works to investigate the possibilities of using 
the software MA28 [see Ref. 6.10], seperating the factorized 
and forward/backward phases in the sparse solver package 
SUPER-LU [see Ref. 6.9] etc... have also been conducted 
by Dr. Nguyen's (Pi's) research team at Old Dominon University 
(ODU) during the grant periods. 
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